{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# GoogLeNet   \n",
    "提高深度神经网络表现核心**提高尺寸**（increasing size）  \n",
    "> 1、增加模型的深度---神经网络的层数  \n",
    "> 2、增加模型的宽度---每一层神经元的数量  \n",
    "\n",
    "上述两点带来的*问题*：   \n",
    "1、**参数量的增加（larger number of parameter）**，进而可能导致过拟合发生（overfitting），就需要对样本就行改进（数量、质量）；  \n",
    "2、**计算资源的怎加（increasing computation resource）**；  \n",
    "\n",
    "**改进措施**：作者给出的建议是：从**全连接到稀疏链接**（fully connected to sparsely connected）。在数据集的概率分布可以用一个大型的、非常稀疏的深度神经网络来表示前提下，通过分析**最后一层激活的相关统计**数据逐层构建最优网络拓扑，对具有**高度相关输出的神经元进行聚类**。  \n",
    "> Their main result states that if the probability distribution of the data-set is representable by a large, very sparse deep neural network, then the optimal network topology can be constructed layer by layer by analyzing the correlation statistics of the activations of the last layer and clustering neurons with highly correlated outputs.  \n",
    "  \n",
    "**模型结构**  \n",
    "正如作者前文所提到的**increase the size**，作者不是在模型的“长度”（添加各种卷积或者全连接操作）而是在“宽度”上进行修改：  \n",
    "<div align=\"center\"><img src=\"https://s2.loli.net/2023/10/13/WtizayVJ8I6cdYO.png\" alt=\"202310131713046\" style=\"zoom:100%;\"/></div>  \n",
    "\n",
    "左侧为最初模型，右侧是在最初模型的基础上所提出的改进模型。对比 **(a)** 模型存在一个$3*3$和$5*5$卷积那么无疑增加了计算消耗（问题二）。作者提出的思路是在这些卷积层添加一个$1*1$的卷积，进而对模型的计算进行减少。   \n",
    " \n",
    "> 模型计算减少  \n",
    "> <div align=\"center\"><img src=\"https://miro.medium.com/v2/resize:fit:828/format:webp/0*zj9mU2xHnnYAN8kB.png\" alt=\"202310131713046\" style=\"zoom:70%;\"/></div>\n",
    "> 那么此时模型的计算次数为：(28x28x32)x(5x5x192)=120422400  \n",
    "> 通过添加 $1x1$ 卷积层  \n",
    "> \n",
    "> <div align=\"center\"><img src=\"https://miro.medium.com/v2/resize:fit:828/format:webp/0*e8P1In1cIa5zd3yV.png \" alt=\"202310131713046\" style=\"zoom:70%;\"/></div>\n",
    "> 那么此时的计算次数为：(28x28x16x1x1x192)+(28x28x32x5x5x16)=12,443,648\n",
    "\n",
    "那么根据上述思路，作者提出如下的模型结构图：  \n",
    "<div align='center'><img src=\"https://miro.medium.com/v2/resize:fit:828/format:webp/1*ZFPOSAted10TPd3hBQU8iQ.png\" alt=\"结构示意图\" style=\"zoom:200%;\"/></div>\n",
    "\n",
    "**整体来说**```GoogLeNet```模型最大的创新点如下：  \n",
    "1、在卷积神经网络中对于提高准确率有效的方法就是改变整体网络size，随之而来也会带来许多难题，于此通过$1*1$的卷积来减少参数；  \n",
    "2、改变整体网络长度的同时，作者提出*改变模型宽度*  \n",
    "\n",
    "论文地址: http://ieeexplore.ieee.org/document/726791/  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "cuda\n"
     ]
    }
   ],
   "source": [
    "import torch \n",
    "import torch.nn as nn\n",
    "import numpy as np\n",
    "from typing import Optional, Tuple, Any\n",
    "device = ('cuda' if torch.cuda.is_available() else 'cpu')\n",
    "print(device)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "class InceptionModule(nn.Module):\n",
    "    \"\"\"\n",
    "    in_channel: 初始输入模型通道数量\n",
    "    out1: 第一层1x1卷积输出\n",
    "    pre_in3: 3x3卷积输入 \n",
    "    com_out3: 3x3卷积输出\n",
    "    pre_in5: 5x5卷积输入 \n",
    "    com_out5: 5x5卷积输出\n",
    "    pool_out: 池化层输出\n",
    "    \"\"\"\n",
    "    def __init__(self, in_channel, out1, pre_in3, com_out3, pre_in5, com_out5, pool_out):\n",
    "        super(InceptionModule, self).__init__()\n",
    "        self.con1 = nn.Conv2d(in_channel, out1, kernel_size= 1) # 1x1卷积层\n",
    "\n",
    "        self.con3 = nn.Sequential(\n",
    "            nn.Conv2d(in_channel, pre_in3, kernel_size= 1),\n",
    "            nn.Conv2d(pre_in3, com_out3, kernel_size=3, padding= 1),\n",
    "        ) # 1x1+3x3\n",
    "\n",
    "        self.con5 = nn.Sequential(\n",
    "            nn.Conv2d(in_channel, pre_in5, kernel_size= 1),\n",
    "            nn.Conv2d(pre_in5, com_out5, kernel_size= 5, padding= 2)\n",
    "        ) # 1x1+5x5\n",
    "\n",
    "        self.pool = nn.Sequential(\n",
    "            nn.MaxPool2d(kernel_size=3, stride=1, padding=1, ceil_mode=True),\n",
    "            nn.Conv2d(in_channel, pool_out, kernel_size= 1, stride= 1)\n",
    "        ) # max_pooling+1x1\n",
    "    def forward(self, x):                          \n",
    "        x1 = self.con1(x)\n",
    "        x3 = self.con3(x)\n",
    "        x5 = self.con5(x)\n",
    "        max_x = self.pool(x)\n",
    "        # print(x1.size(), x3.size(), x5.size(), max_x.size())\n",
    "        return torch.concatenate([x1, x3, x5, max_x], 1)\n",
    "\n",
    "class ConvBlock(nn.Module):\n",
    "    \"\"\"\n",
    "    定义一个卷积网络类型，简化后续卷积操作\n",
    "    \"\"\"\n",
    "    def __init__(self, in_channels: int, out_channels: int, **kwargs: Any) -> None:\n",
    "        super(ConvBlock, self).__init__()\n",
    "        self.conv = nn.Conv2d(in_channels, out_channels, bias=False, **kwargs)\n",
    "        self.batch_norm = nn.BatchNorm2d(out_channels, eps= 0.001)\n",
    "        self.relu = nn.ReLU(True)\n",
    "\n",
    "    def forward(self, x):\n",
    "        out = self.conv(x)\n",
    "        out = self.batch_norm(out)\n",
    "        out = self.relu(out)\n",
    "        return out\n",
    "\n",
    "class InceptionAux(nn.Module):\n",
    "    \"\"\"\n",
    "    在GoogLeNet中存在分支判断函数，因此补充分支判断函数定义\n",
    "    \"\"\"\n",
    "    def __init__(\n",
    "            self, \n",
    "            input_size: int, \n",
    "            num_classes: int= 1000, \n",
    "            dropout: float=0.7) -> None:\n",
    "        super(InceptionAux, self).__init__()\n",
    "        self.avg_pool = nn.AvgPool2d(kernel_size=5, stride=3, ceil_mode=True)\n",
    "        self.conv = ConvBlock(input_size, 128, kernel_size=1, stride=1)\n",
    "        self.fc1 = nn.Linear(2048, 1024)\n",
    "        self.fc2 = nn.Linear(1024, num_classes)\n",
    "        self.relu = nn.ReLU(True)\n",
    "        self.dropout = nn.Dropout(dropout, True)\n",
    "    def forward(self, x):\n",
    "        x = self.avg_pool(x)\n",
    "        x = self.conv(x)\n",
    "        x = torch.flatten(x, 1)\n",
    "        x = self.fc1(x)\n",
    "        x = self.relu(x)\n",
    "        x = self.dropout(x)\n",
    "        x = self.fc2(x)\n",
    "        return x"
   ]
  },
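  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A note on `nn.Linear(2048, 1024)` in `InceptionAux` (a sketch of the shape arithmetic, assuming the 14x14 feature maps the branch receives in the network below): `AvgPool2d(kernel_size=5, stride=3, ceil_mode=True)` maps 14x14 down to 4x4, and the `ConvBlock` projects to 128 channels, so flattening yields 128 * 4 * 4 = 2048 features:\n",
    "\n",
    "```python\n",
    "import math\n",
    "# output side of AvgPool2d(kernel_size=5, stride=3, ceil_mode=True) on a 14x14 input\n",
    "side = math.ceil((14 - 5) / 3) + 1\n",
    "flattened = 128 * side * side\n",
    "print(side, flattened)  # 4 2048\n",
    "```"
   ]
  },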
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "class GoogLeNet(nn.Module):\n",
    "    \"\"\"\n",
    "    input_size:输入模型图片channel\n",
    "    out_label:输出图片类型\n",
    "    \"\"\"\n",
    "    def __init__(\n",
    "            self, \n",
    "            input_size: int=3, \n",
    "            num_classes: int= 1000, \n",
    "            dropout: float = 0.4, \n",
    "            dropout_aux: float = 0.7,\n",
    "            inception_aux: bool= True) -> None:\n",
    "        super(GoogLeNet, self).__init__()\n",
    "        \n",
    "        self.inception_aux = inception_aux # 判断是否走分支\n",
    "        self.conv1 = ConvBlock(input_size, 64, kernel_size=7, stride=2, padding=3)\n",
    "        self.max_pool1 = nn.MaxPool2d(kernel_size= 3, stride=2) # 56x56x64\n",
    "        self.conv2 = ConvBlock(64, 192, kernel_size=1, stride=1) # 56x56x192\n",
    "        self.conv3 = ConvBlock(192, 192, kernel_size=3, stride=1, padding=1) # 56x56x192\n",
    "        self.max_pool2 = nn.MaxPool2d(kernel_size=3, stride=2) # 28x28x192\n",
    "\n",
    "        self.incept3a = InceptionModule(192, 64, 96, 128, 16, 32, 32) # 28x28x256\n",
    "        self.incept3b = InceptionModule(256, 128, 128, 192, 32, 96, 64) # 28x28x480\n",
    "        self.max_pool3 = nn.MaxPool2d(kernel_size=3, stride=2) # 14x14x480\n",
    "\n",
    "        self.incept4a = InceptionModule(480, 192, 96, 208, 16, 48, 64) # 14x14x512\n",
    "        # 判断\n",
    "        self.incept4b = InceptionModule(512, 160, 112, 224, 24, 64, 64) # 14x14x512\n",
    "        self.incept4c = InceptionModule(512, 128, 128, 256, 24, 64, 64) # 14x14x512\n",
    "        self.incept4d = InceptionModule(512, 112, 144, 288, 32, 64, 64) # 14x14x528\n",
    "        # 判断\n",
    "        self.incept4e = InceptionModule(528, 256, 160, 320, 32, 128, 128) # 14x14x832\n",
    "\n",
    "        self.max_pool4 = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) # 7x7x832\n",
    "        self.incept5a = InceptionModule(832, 256, 160, 320, 32, 128, 128) # 7x7x832\n",
    "        self.incept5b = InceptionModule(832, 384, 192, 384, 48, 128, 128) # 7x7x1024\n",
    "\n",
    "        self.avg_pool = nn.AvgPool2d(kernel_size=7, stride=1) # 1x1x1024\n",
    "        self.linear = nn.Linear(1024, num_classes) # 1x1x1000\n",
    "        self.dropout = nn.Dropout(dropout_aux)\n",
    "\n",
    "        # 补充判断函数\n",
    "        if inception_aux:\n",
    "            self.inception_aux1 = InceptionAux(512, num_classes, dropout)\n",
    "            self.inception_aux2 = InceptionAux(528, num_classes, dropout)\n",
    "        else:\n",
    "            self.inception_aux1 = None\n",
    "            self.inception_aux2 = None\n",
    "\n",
    "    def forward(self, x):\n",
    "        x = self.conv1(x)\n",
    "        x = self.max_pool1(x)\n",
    "        x = self.max_pool2(self.conv3(self.conv2(x)))\n",
    "        x = self.incept3a(x)\n",
    "        x = self.incept3b(x)\n",
    "        x = self.max_pool3(x)\n",
    "        x = self.incept4a(x)\n",
    "\n",
    "        if self.inception_aux1 is not None:\n",
    "            aux1 = self.inception_aux1(x)\n",
    "\n",
    "        x = self.incept4b(x)\n",
    "        x = self.incept4c(x)\n",
    "        x = self.incept4d(x)\n",
    "\n",
    "        if self.inception_aux2 is not None:\n",
    "            aux2 = self.inception_aux2(x)\n",
    "\n",
    "        x = self.incept4e(x)\n",
    "        x = self.max_pool4(x)\n",
    "        x = self.incept5a(x)\n",
    "        x = self.incept5b(x)\n",
    "        x = self.avg_pool(x)\n",
    "        x = torch.flatten(x, 1)\n",
    "        x = self.dropout(x)\n",
    "        x = self.linear(x)\n",
    "        \n",
    "        return x, aux2, aux1"
   ]
  },
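  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick sanity check (a sketch, not part of the original code), each `InceptionModule` output has as many channels as the sum of its four branch outputs; e.g. for `incept3a` with arguments `(192, 64, 96, 128, 16, 32, 32)`:\n",
    "\n",
    "```python\n",
    "# incept3a branch outputs: 1x1, 3x3, 5x5, pool projection\n",
    "out1, com_out3, com_out5, pool_out = 64, 128, 32, 32\n",
    "total = out1 + com_out3 + com_out5 + pool_out\n",
    "print(total)  # 256, matching the in_channel of incept3b\n",
    "```"
   ]
  },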
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = GoogLeNet().to(device)\n",
    "x = torch.randn(1, 3, 224, 224).to(device)\n",
    "y = model(x)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "base",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
